
When starting out with deep learning, one of the first questions to ask is which framework to learn.

Common choices include Theano, TensorFlow, Torch, and Keras. Each has its own pros and cons and its own way of doing things.

From The Anatomy of Deep Learning Frameworks

The core components of a deep learning framework we must consider are:

  • How Tensor Objects are defined. At the heart of the framework is the tensor object. A tensor is a generalization of a matrix to n dimensions. We need a Tensor Object that supports storing the data in the form of tensors. Beyond that, we would like the object to be able to convert other data types (images, text, video) into tensors and back, to support indexing and operator overloading, to store the data in a space-efficient way, and so on.
  • How Operations on the Tensor Object are defined. A neural network can be considered as a series of Operations performed on an input tensor to give an output.
  • The use of a Computation Graph and its Optimizations. Operations are usually implemented as classes rather than as functions. This allows us to store more information about the operation, such as the calculated shape of the output (useful for sanity checks), how to compute the gradient or the gradient itself (for auto-differentiation), and whether to compute the op on GPU or CPU. The power of neural networks lies in the ability to chain multiple operations to form a powerful approximator. The standard use case is therefore to initialize a tensor, perform operation after operation on it, and finally interpret the resulting tensor as labels or real values. Unfortunately, as you chain more and more operations together, several issues arise that can drastically slow down your code and introduce bugs, and you need to see the bigger picture to even notice that they exist. We need a way to optimize the resulting chain of operations for both space and time. This is the role of the Computation Graph: an object that contains links to the instances of the various Ops, records which operation takes the output of which other operation, and holds additional information.
  • The use of Auto-differentiation tools. Another benefit of having the computational graph is that calculating gradients used in the learning phase becomes modular and straightforward to compute.
  • The use of BLAS/cuBLAS and cuDNN extensions for maximizing performance. BLAS (Basic Linear Algebra Subprograms) is a collection of optimized matrix operations, initially written in Fortran. These can be leveraged to do very fast matrix (tensor) operations and can provide significant speedups. Other software packages, such as Intel MKL and ATLAS, perform similar functions. BLAS packages are usually optimized assuming that the instructions will run on a CPU. In deep learning this is not the case, and BLAS may not be able to fully exploit the parallelism offered by GPGPUs. To solve this issue, NVIDIA released cuBLAS, which is optimized for GPUs and is now included with the CUDA toolkit.
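To make the idea of operations as classes more concrete, here is a minimal sketch in plain Python. The class names `Add` and `Mul`, the methods `forward` and `gradient`, and the helper `f_and_grads` are hypothetical and chosen only for illustration; they are not part of any framework's API. Each operation stores how to compute its output and its local gradient, and the chain rule combines the local gradients.

```python
# Toy operation classes (hypothetical names, not a real framework API).
class Add:
    """Addition of two scalars."""

    def forward(self, x, y):
        return x + y

    def gradient(self, x, y):
        # d(x + y)/dx = 1 and d(x + y)/dy = 1
        return 1.0, 1.0


class Mul:
    """Multiplication of two scalars."""

    def forward(self, x, y):
        return x * y

    def gradient(self, x, y):
        # d(x * y)/dx = y and d(x * y)/dy = x
        return y, x


def f_and_grads(a, b):
    """Evaluate (a * b) + (a + b) together with its two partial derivatives."""
    add_in, mul, add_out = Add(), Mul(), Add()
    c = add_in.forward(a, b)     # c = a + b
    d = mul.forward(a, b)        # d = a * b
    e = add_out.forward(c, d)    # e = c + d

    # Chain rule: combine the local gradients stored on each operation.
    de_dc, de_dd = add_out.gradient(c, d)
    dc_da, dc_db = add_in.gradient(a, b)
    dd_da, dd_db = mul.gradient(a, b)
    de_da = de_dc * dc_da + de_dd * dd_da
    de_db = de_dc * dc_db + de_dd * dd_db
    return e, de_da, de_db


value, grad_a, grad_b = f_and_grads(2.0, 3.0)
print(value, grad_a, grad_b)   # 11.0 4.0 3.0
```

A real framework would also record output shapes and device placement on each operation; this sketch keeps only the forward and gradient rules.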

The computational model for TensorFlow (tf) is a directed graph.

Nodes are functions (operations in tf terminology) and edges are tensors.

Tensors are multidimensional data arrays.

$$f(a,b) = (a*b) + (a+b)$$

There are several reasons for this design:

  • Most importantly, it is a good way to split up computation into small, easily differentiable pieces. tf uses automatic differentiation to compute the derivative of every node with respect to any other node that can affect the first node's output.
  • The graph is also a convenient way of distributing computation across multiple CPUs, GPUs, etc.
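As a small worked example of differentiating piece by piece, the function $f$ above can be split into three intermediate nodes (the names $c$, $d$ and $e$ are chosen here only for illustration):

$$c = a + b, \qquad d = a * b, \qquad e = c + d = f(a,b)$$

Each node has a simple local derivative, and the chain rule combines them:

$$\frac{\partial e}{\partial a} = \frac{\partial e}{\partial c}\frac{\partial c}{\partial a} + \frac{\partial e}{\partial d}\frac{\partial d}{\partial a} = 1 \cdot 1 + 1 \cdot b = 1 + b$$

$$\frac{\partial e}{\partial b} = \frac{\partial e}{\partial c}\frac{\partial c}{\partial b} + \frac{\partial e}{\partial d}\frac{\partial d}{\partial b} = 1 \cdot 1 + 1 \cdot a = 1 + a$$

For $a = 2$ and $b = 3$ this gives $f = 11$, $\partial f / \partial a = 4$ and $\partial f / \partial b = 3$.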

The primary API of tf (written in C++) is accessed through Python.

There are different ways of installing tf:

  • Pip install: May impact existing Python programs on your machine.
  • Virtualenv install: Install TensorFlow in its own directory, not impacting any existing Python programs on your machine.
  • Anaconda install (Windows: only Python 3, not Python 2.7): Install TensorFlow in its own environment for those running the Anaconda Python distribution. Does not impact existing Python programs on your machine.
  • Docker install: Run TensorFlow in a Docker container isolated from all other programs on your machine. It allows coexisting tf versions.
  • Installing from sources: Install TensorFlow by building a pip wheel that you then install using pip.

Our preferred way is Docker install.


tf computation graphs are described in code with the tf API.

In [ ]:
import tensorflow as tf

In [ ]:
# Basic constant operations = to assign a value to a tensor
a = tf.constant(2)
b = tf.constant(3)
c = a+b 
d = a*b
e = c+d

# non interactive session

with tf.Session() as sess:
    print("(a+b)+(a*b) = %i" % sess.run(e))

You can create initialized tensors in many ways:

In [ ]:
a = tf.zeros([2,3], tf.int32)
b = tf.ones([2,3], tf.int32)
c = tf.fill([3,3], 23.9)
d = tf.range(0,10,1)

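with tf.Session() as sess:
    # evaluate each tensor and print its value
    print(sess.run(a))
    print(sess.run(b))
    print(sess.run(c))
    print(sess.run(d))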

tf sequences are not iterable! Unlike a Python range, a tensor such as d above cannot be looped over directly with for.

We can also generate random variables:

In [ ]:
a = tf.random_normal([2,2], 0.0, 1.0)
b = tf.random_uniform([2,2], 0.0, 1.0)

with tf.Session() as sess:
    # every call to run draws new random values
    print(sess.run(a))
    print(sess.run(b))

How can we generate a randomly shuffled sequence of numbers in TensorFlow?

In [ ]:
idx = tf.constant(20)
idx_list = tf.range(idx) # 0~19
shuffle = tf.random_shuffle(idx_list)

with tf.Session() as sess:
    a, b = sess.run([idx_list, shuffle])
    print(a)
    print(b)

In [ ]:
# Basic operations with variable graph input

a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)
c = tf.add(a,b) 
d = tf.mul(a,b)
e = tf.add(c,d)

# feed dictionary: maps each placeholder to the value it receives
values = {a: 5, b: 3}

# non interactive session

with tf.Session() as sess:
    print('a = %i' % sess.run(a, values))
    print('b = %i' % sess.run(b, values))
    print("(a+b)+(a*b) = %i" % sess.run(e, values))

A computational graph is a series of functions chained together, each passing its output to zero, one or more functions further along the chain.

In this way we can construct very complex transformations on data by using a library of simple functions.

Nodes represent some sort of computation being done in the graph context.

Edges are the actual values (tensors) that get passed to and from nodes.

  • The values flowing into the graph can come from different sources: from a different graph, from a file, entered by the client, etc. The input nodes simply pass on values given to them.
  • The other nodes take values, apply an operation and output their result.
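The two kinds of nodes can be sketched in plain Python. This is a toy model, not how TensorFlow is implemented: the classes `Placeholder` and `Operation` and the method `compute` are hypothetical names used only to show that the graph is defined first and evaluated later with concrete values.

```python
class Placeholder:
    """Input node: passes on the value supplied for it at run time."""

    def __init__(self, name):
        self.name = name
        self.inputs = []

    def compute(self, feed):
        return feed[self]


class Operation:
    """Node that applies a function to the values of its input nodes."""

    def __init__(self, fn, inputs, name):
        self.fn = fn
        self.inputs = inputs
        self.name = name

    def compute(self, feed):
        args = [node.compute(feed) for node in self.inputs]
        return self.fn(*args)


# Step 1: define the graph; nothing is computed yet.
a = Placeholder("input_a")
b = Placeholder("input_b")
c = Operation(lambda x, y: x + y, [a, b], "add_1")
d = Operation(lambda x, y: x * y, [a, b], "mul_1")
e = Operation(lambda x, y: x + y, [c, d], "add_2")

# Step 2: run the graph with data.
result = e.compute({a: 5, b: 3})
print(result)   # 23
```

Note that this naive version recomputes shared inputs every time they are needed; a real framework avoids that by tracking dependencies.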

Values running on edges are tensors:

In [ ]:
# Basic operations with variable as graph input

a = tf.placeholder(tf.int16,shape=[2])
b = tf.placeholder(tf.int16,shape=[2])
c = tf.add(a,b) 
d = tf.mul(a,b)
e = tf.add(c,d)

# feed dictionary: maps each placeholder to the vector it receives
variables = {a: [2,2], b: [3,3]}

# non interactive session

with tf.Session() as sess:
    print(sess.run(a, variables))
    print(sess.run(b, variables))
    print(sess.run(e, variables))


Implement this computational graph:

In [ ]:
# your code here

There are certain connections between nodes that are not allowed: you cannot create circular dependencies.

Dependency: Any node A that is required for the computation of a later node B is said to be a dependency of B.

The main reason is that circular dependencies create endless feedback loops.

There is one exception to this rule: recurrent neural networks. In this case tf simulates this kind of dependency by copying a finite number of versions of the graph, placing them side by side, and feeding them into one another in sequence. This process is referred to as unrolling the graph.
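The idea of unrolling can be illustrated with a few lines of plain Python. This is only a sketch under simple assumptions: the recurrent computation is a hypothetical function `step` with a made-up update rule, and the loop in `unroll` plays the role of laying out one copy of that computation per input, each copy feeding its result into the next.

```python
def step(state, x):
    """One copy of the recurrent computation: new state from old state and input."""
    return 0.5 * state + x   # made-up update rule, for illustration only


def unroll(inputs, initial_state=0.0):
    """Chain one copy of `step` per input; each copy feeds the next one."""
    state = initial_state
    states = []
    for x in inputs:          # finite number of copies = len(inputs)
        state = step(state, x)
        states.append(state)
    return states


print(unroll([1.0, 2.0, 3.0]))   # [1.0, 2.5, 4.25]
```

Because the number of copies is finite and each copy only depends on the previous one, the unrolled chain contains no circular dependency.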

Keeping track of dependencies is a basic feature of tf. Let's suppose that we want to compute the output value of the mul node. We can see in the unrolled graph that it is not necessary to compute the full graph to get the output of that node. But how can we ensure that we only compute the necessary nodes?

It's pretty easy:

  • Build a list for each node with all nodes it directly depends on (not indirectly!).
  • Initialize an empty stack, which will eventually hold all the nodes we want to compute.
  • Push onto the stack the node whose output you want.
  • Recursively look at its dependency list and add to the stack the nodes it depends on, until there are no dependencies left to add. This guarantees that the stack holds all the nodes we need.

The stack will be ordered in a way that we are guaranteed to be able to run each node in the stack as we iterate through it.

The main thing to look out for is to keep track of nodes that were already computed and to store their value in memory.
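The procedure above can be sketched in plain Python. This is an illustrative version, not TensorFlow's actual scheduler: the dictionary `dependencies` and the function `execution_order` are hypothetical, the node names are only examples, and instead of a stack the function returns a list that is already in a valid run order.

```python
# Direct dependencies of each node in a small example graph.
dependencies = {
    "input_a": [],
    "input_b": [],
    "add_1": ["input_a", "input_b"],
    "mul_1": ["input_a", "input_b"],
    "add_2": ["add_1", "mul_1"],
}


def execution_order(target, dependencies):
    """Return the nodes needed for `target`, each listed after its dependencies."""
    order = []
    visited = set()

    def visit(node):
        if node in visited:       # already scheduled, do not repeat it
            return
        visited.add(node)
        for dep in dependencies[node]:
            visit(dep)            # schedule dependencies first
        order.append(node)

    visit(target)
    return order


print(execution_order("mul_1", dependencies))
# ['input_a', 'input_b', 'mul_1']
print(execution_order("add_2", dependencies))
# ['input_a', 'input_b', 'add_1', 'mul_1', 'add_2']
```

Asking for mul_1 schedules only three nodes, while asking for add_2 schedules the whole graph. The `visited` set is what keeps a shared input from being scheduled twice.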

As we have seen in previous code, tf workflow is a two-step process:

  • Define the computation graph.
  • Run the graph with data.

In [ ]:
# graph definition
# we can assign a name to every node

a = tf.placeholder(tf.int32, name='input_a')
b = tf.placeholder(tf.int32, name='input_b')
c = tf.add(a,b,name='add_1') 
d = tf.mul(a,b,name='mul_1')
e = tf.add(c,d,name='add_2')

# feed dictionary: maps each placeholder to the value it receives
values = {a: 5, b: 3}

# now we can run the graph in an interactive session

sess = tf.Session()
print(sess.run(e, values))

tf has a very useful tool: TensorBoard. Let's see how to use it.

In [ ]:
# cleaning the tf graph space
tf.reset_default_graph()

a = tf.placeholder(tf.int16, name='input_a')
b = tf.placeholder(tf.int16, name='input_b')
c = tf.add(a,b,name='add_1') 
d = tf.mul(a,b,name='mul_1')
e = tf.add(c,d,name='add_2')

# feed dictionary: maps each placeholder to the value it receives
values = {a: 5, b: 3}

# now we can run the graph

# graphs are run by invoking Session objects
session = tf.Session()

# when you are passing an operation to 'run' you are 
# asking to run all operations necessary to compute that node

# you can save the value of the node in a Python var
output = session.run(e, values)

# now let's visualize the graph

# SummaryWriter is an object where we can save information
# about the execution of the computational graph
writer = tf.train.SummaryWriter('my_graph', session.graph)

# closing the writer and the interactive session
writer.close()
session.close()

print(output)

Open a terminal and type in:

tensorboard --logdir="my_graph"

This starts a tensorboard server on port 6006. There, click on the Graphs link. You can see that each of the nodes is labeled based on the name parameter you passed into each operation.


Implement and visualize this graph for a constant tensor [5,3]:

Check these functions in the tf official documentation (https://www.tensorflow.org/): tf.reduce_prod, tf.reduce_sum.

In [ ]:
# your code here

tf input data

tf can take several Python var types that are automatically converted to tensors:

tf.constant([5,3], name='input_a')

But tf has a plethora of other data types: tf.int16, tf.quint8, etc.

tf is tightly integrated with NumPy. In fact, tf data types are based on those from NumPy. Tensors returned from Session.run are NumPy arrays. NumPy arrays are the recommended way of specifying tensors.

The shape of a tensor describes both the number of dimensions in the tensor and the length of each dimension. In addition to specifying a fixed length for each dimension, you can also assign a flexible length by passing in None as a dimension's value.
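To see what a shape with a flexible dimension means, here is a small plain-Python sketch. The helpers `shape_of` and `is_compatible` are hypothetical and written only for this illustration; they assume regular (rectangular) nested lists and are not part of the tf API.

```python
def shape_of(value):
    """Return the shape of a regular nested list as a list of dimension lengths."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def is_compatible(shape, spec):
    """Check a concrete shape against a spec in which None matches any length."""
    if len(shape) != len(spec):
        return False
    return all(s is None or s == n for n, s in zip(shape, spec))


matrix = [[1, 2, 3], [4, 5, 6]]
print(shape_of(matrix))                        # [2, 3]
print(shape_of(7))                             # [] (a scalar has no dimensions)
print(is_compatible([2, 3], [None, 3]))        # True: any number of rows
print(is_compatible([2, 3], [None, 2]))        # False: second dimension differs
```

A spec such as [None, 3] therefore accepts a batch of any size, as long as every row has length 3.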

In [ ]:
import tensorflow as tf
import numpy as np


a = tf.placeholder(tf.int16, shape=[2,2], name='input_a')
shape = tf.shape(a)

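session = tf.Session()
# feed a 2x2 NumPy array and evaluate the shape of the placeholder
print(session.run(shape, feed_dict={a: np.array([[1, 2], [3, 4]], dtype=np.int16)}))
session.close()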

We can feed data points to a placeholder by iterating through the data set:

In [ ]:

list_a_values = [1,2,3]

a = tf.placeholder(tf.int16)
b = a * 2

with tf.Session() as sess:
    for a_value in list_a_values:
        print(sess.run(b,{a: a_value}))

tf operations

tf overloads common mathematical operations:

In [ ]:
import tensorflow as tf


a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)
c = a+b 
d = a*b
e = c+d

# feed dictionary: maps each placeholder to the value it receives
variables = {a: 5, b: 3}

with tf.Session() as sess:
    print("(a+b)+(a*b) = %i" % sess.run(e, variables))

There are many more TensorFlow operations.

tf graphs

Creating a graph is simple:

import tensorflow as tf
g = tf.Graph()

Once the graph is initialized we can attach operations to it by using the Graph.as_default() method:

with g.as_default():
    a = tf.mul(2,3)

tf automatically creates a graph at the beginning and assigns it to be the default. Thus, if not using Graph.as_default() any operation will be automatically placed in the default graph.

Creating multiple graphs can be useful if you are defining multiple models that do not have interdependencies:

g1 = tf.Graph()
g2 = tf.Graph()

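with g1.as_default():
    # define the operations of the first model here
    pass

with g2.as_default():
    # define the operations of the second model here
    pass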

tf Variables

Tensor and Operation objects are immutable, but we need a mechanism to save changing values over time.

This is accomplished with Variable objects, which contain mutable tensor values that persist across multiple calls to Session.run().

Variables can be used anywhere you might use a tensor.

tf has a number of helper operations to initialize variables: tf.zeros(), tf.ones(), tf.random_uniform(), tf.random_normal(), etc.

Variable objects live in a Graph but their state is managed by a Session. Because of this, they need an extra initialization step:

import tensorflow as tf

a = tf.Variable(3,name="my_var")
b = tf.add(5,a)

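with tf.Session() as sess:
    # variables must be initialized explicitly before they are used
    sess.run(tf.initialize_all_variables())
    print(sess.run(b))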

In order to change the value of a Variable we can use the Variable.assign() method:

In [ ]:
import tensorflow as tf

a = tf.Variable(3,name="my_var")
b = a.assign(tf.mul(2,a))  # variables are objects, not ops.

# The statement a.assign(...) does not actually assign any value to a, 
# but rather creates a tf.Operation that you have to explicitly 
# run to update the variable.

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print("a: %s" % a.eval())   # the assign op has not been run yet
    print("b: %s" % sess.run(b))
    print("b: %s" % sess.run(b))
    print("b: %s" % sess.run(b))

In [ ]:

a = tf.Variable(3,name="my_var")
b = a.assign(tf.mul(2,a))
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(a.eval())   # b is never run, so a keeps its initial value

We can increment and decrement variables:

In [ ]:
import tensorflow as tf

a = tf.Variable(3,name="my_var")

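with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(a.assign_add(1)))   # increment by 1
    print(sess.run(a.assign_sub(1)))   # decrement by 1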

Some tf classes (e.g. Optimizer) are able to change variable values automatically, without being explicitly asked to do so.

TensorFlow sessions maintain variable values separately. Each Session can have its own current value for a variable defined in the graph:

In [ ]:

a = tf.Variable(10)

sess1 = tf.Session()
sess2 = tf.Session()


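# each session must initialize its own copy of the variable
sess1.run(tf.initialize_all_variables())
sess2.run(tf.initialize_all_variables())

print(sess1.run(a.assign_add(10)))
print(sess2.run(a.assign_sub(2)))

sess1.close()
sess2.close()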
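The split between a variable's definition and its per-session state can be mimicked in plain Python. This is a toy model with hypothetical classes `ToyVariable` and `ToySession`; it only illustrates the behavior described above and says nothing about how TensorFlow stores variables internally.

```python
class ToyVariable:
    """Graph-side description of a variable: only its initial value."""

    def __init__(self, initial_value):
        self.initial_value = initial_value


class ToySession:
    """Holds the current value of every variable it has initialized."""

    def __init__(self):
        self.values = {}

    def initialize(self, variable):
        self.values[variable] = variable.initial_value

    def assign_add(self, variable, delta):
        # Raises KeyError if the variable was never initialized in this session.
        self.values[variable] += delta
        return self.values[variable]


a = ToyVariable(10)
sess1, sess2 = ToySession(), ToySession()
sess1.initialize(a)
sess2.initialize(a)

print(sess1.assign_add(a, 10))            # 20
print(sess2.assign_add(a, -2))            # 8
print(sess1.values[a], sess2.values[a])   # 20 8
```

The same variable ends up with two different current values, one per session, and a session that skipped the initialization step would fail on the first update.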


tf name scopes

tf offers a tool to help organize your graphs: name scopes.

Name scopes allow you to group operations into larger, named blocks. This is very useful for visualizing complex models with TensorBoard.

In [ ]:
import tensorflow as tf

with tf.name_scope("Scope_A"):
    a = tf.add(1, 2, name="A_add")
    b = tf.mul(a, 3, name="A_mul")

with tf.name_scope("Scope_B"):
    c = tf.add(4, 5, name="B_add")
    d = tf.mul(c, 6, name="B_mul")

e = tf.add(b, d, name="output")

writer = tf.train.SummaryWriter('./name_scope_1', graph=tf.get_default_graph())

We can start tensorboard to see the graph: tensorboard --logdir="./name_scope_1".

You can expand the name scope boxes by clicking +.


Let's build and visualize a complex model:

  • Our inputs will be placeholders.
  • The model will take in a single vector of any length.
  • The graph will be segmented in name scopes.
  • We will accumulate the total value of all outputs over time.
  • At each run, we are going to save the output of the graph, the accumulated total of all outputs, and the average value of all outputs to disk for use in tensorboard.

In [ ]:
import tensorflow as tf
import numpy as np


# Explicitly create a Graph object
graph = tf.Graph()

with graph.as_default():
    with tf.name_scope("variables"):
        # your code here

    # Primary transformation Operations
    with tf.name_scope("transformation"):
        # Separate input layer
        with tf.name_scope("input"):
            # your code here

        # Separate middle layer
        with tf.name_scope("intermediate_layer"):
            # your code here

        # Separate output layer
        # your code here
    with tf.name_scope("update"):
        # Increments the total_output Variable by the latest output
        # your code here

    # Summary Operations
    with tf.name_scope("summaries"):
        avg = tf.div(update_total, tf.cast(increment_step, tf.float32), name="average")
        # Creates summaries for output node
        tf.scalar_summary(b'Output', output, name="output_summary")
        tf.scalar_summary(b'Sum of outputs over time', update_total, name="total_summary")
        tf.scalar_summary(b'Average of outputs over time', avg, name="average_summary")
    # Global Variables and Operations
    with tf.name_scope("global_ops"):
        # Initialization Op
        init = tf.initialize_all_variables()    
        # Merge all summaries into one Operation
        merged_summaries = tf.merge_all_summaries()

# Start an interactive Session, using the explicitly created Graph
sess = tf.Session(graph=graph)

# Open a SummaryWriter to save summaries
writer = tf.train.SummaryWriter('./improved_graph', graph)

# Initialize Variables
sess.run(init)

Let's write a function to run the graph several times:

In [ ]:
def run_graph(input_tensor):
    """Helper function; runs the graph with given input tensor and saves summaries"""
    feed_dict = {a: input_tensor}
    out, step, summary = sess.run([output, increment_step, merged_summaries], 
                                  feed_dict=feed_dict)
    writer.add_summary(summary, global_step=step)

In [ ]:
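# Run the graph with various inputs
# (example input vectors of different lengths)
run_graph([2, 8])
run_graph([3, 1, 3, 3])

# Write the summaries to disk
writer.flush()

# Close the SummaryWriter
writer.close()

# Close the session
sess.close()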

To start TensorBoard after running this code, run the following command:

tensorboard --logdir='./improved_graph'